head(oasis_dice)
## ...1 participant_id session_id dice_bet_probability darq_probability
## 1 0 sub-OASIS10002 ses-M00 0.8926427 0.8916364312
## 2 1 sub-OASIS10010 ses-M00 0.8117313 0.0001414413
## 3 2 sub-OASIS10011 ses-M00 0.9305678 0.9999741316
## 4 3 sub-OASIS10037 ses-M00 0.9230336 0.6704366803
## 5 4 sub-OASIS10042 ses-M00 0.8931396 0.9995048046
## 6 5 sub-OASIS10045 ses-M00 0.9522836 0.9449698329
## darq_pass dice_hd_bet_probability mutual_info correlation_ratio norm_mi
## 1 TRUE 0.9095625 0.8533593 0.7847766 0.2011821
## 2 FALSE 0.8299690 0.7254140 0.6766012 0.1741843
## 3 TRUE 0.9405499 0.8728209 0.8159280 0.2073242
## 4 TRUE 0.9128021 0.7606033 0.7152034 0.1821181
## 5 TRUE 0.9225100 0.7154123 0.7272557 0.1761768
## 6 TRUE 0.9409483 0.7981154 0.7787458 0.1938449
## correlation_coef cr_l1
## 1 0.6140392 0.6578466
## 2 0.4409037 0.5621014
## 3 0.6240867 0.6902325
## 4 0.5447207 0.6028452
## 5 0.6027540 0.6089401
## 6 0.6402254 0.6620776
The scatter plots illustrate the relationship between each metric and DARQ probability. The correlation printed under each scatter plot is, in order: 0.6859204, 0.7552805, 0.5150382, 0.7392367, 0.6330968, 0.7570659, and 0.7549297.
This figure shows how these methods correlate with one another. In general, all metrics are positively correlated, meaning that they capture similar information. The darker squares, however, highlight the metrics that cluster together based on their similarity.
corrplot::corrplot(cor(oasis_dice_num), method = "shade", order = "hclust", hclust.method = "ward.D2")
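The `oasis_dice_num` object used above is not defined in the code shown; presumably it keeps only the numeric metric columns of `oasis_dice`. A minimal sketch of how it could be built (the stand-in data frame below mimics the `head()` output above and is not the real data):

```r
# Stand-in for oasis_dice, mimicking the printed head() above (first two rows,
# first four columns only); the real object comes from earlier in the analysis.
oasis_dice <- data.frame(
  participant_id = c("sub-OASIS10002", "sub-OASIS10010"),
  session_id = c("ses-M00", "ses-M00"),
  dice_bet_probability = c(0.8926427, 0.8117313),
  darq_probability = c(0.8916364, 0.0001414)
)

# Keep only the numeric columns -> ready for cor()/corrplot().
# Base-R equivalent of dplyr::select(oasis_dice, where(is.numeric)).
oasis_dice_num <- oasis_dice[sapply(oasis_dice, is.numeric)]
str(oasis_dice_num)
```

Note that character (and logical, e.g. `darq_pass`) columns are dropped by the `is.numeric` filter, so `cor()` runs only on the metric scores.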
To explore and compare the metrics, we first look at their distributions, then perform a principal component analysis (PCA) on all metrics.
The distributions show that DARQ behaves very differently from the other metrics: it tends to give extreme scores, close to 0 or 1.
oasis_dice_num %>% pivot_longer(everything()) %>%
ggplot(aes(x = value)) + geom_histogram(bins = 25) + facet_wrap(~name)
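The "extreme scores" observation can be quantified, for instance as the fraction of values lying within 0.01 of 0 or 1. A sketch on simulated scores (`frac_extreme` and the data below are illustrative, not part of the analysis):

```r
# Fraction of probabilities in the extremes [0, eps) or (1 - eps, 1]
frac_extreme <- function(p, eps = 0.01) mean(p < eps | p > 1 - eps)

set.seed(3)
darq_like <- c(runif(45, 0.99, 1), runif(5, 0, 0.01))  # mostly extreme, DARQ-like
dice_like <- rbeta(50, 20, 3)                          # concentrated, DICE-like

frac_extreme(darq_like)   # close to 1
frac_extreme(dice_like)   # close to 0
```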
We performed a PCA on all metrics aside from DARQ to examine the relationships between them. Because these metrics all lie between 0 and 1, we ran a non-centered, non-scaled PCA to preserve the variance and the original values of the metrics. This PCA is equivalent to performing a singular value decomposition (SVD) directly on the data matrix.
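The equivalence between a non-centered, non-scaled PCA and an SVD of the raw data can be checked directly. A minimal sketch on synthetic data (the matrix below is a stand-in, not the OASIS metrics):

```r
# Check: prcomp() with center = FALSE, scale. = FALSE reproduces the
# singular value decomposition of the raw data matrix.
set.seed(42)
X <- matrix(runif(60), nrow = 10)   # 10 observations x 6 metrics, values in [0, 1]

pca <- prcomp(X, center = FALSE, scale. = FALSE)
sv  <- svd(X)

# Row factor scores from the PCA equal U %*% diag(d) from the SVD
# (up to column sign flips, hence the abs()).
stopifnot(all.equal(abs(unname(pca$x)), abs(sv$u %*% diag(sv$d))))
```

Since no centering is applied, `pca$rotation` is exactly the right singular vectors `V`, and the factor scores `X %*% V = U D` follow from the decomposition `X = U D V'`.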
The row factor scores (participants) showed a separation roughly matching the pass/fail outcomes from DARQ. In the figure, GREEN dots indicate passing participants, and RED dots indicate failing ones.
From the column factor scores (metrics), we identified three clusters of metrics. To narrow these down to a few metrics, we chose dice_hd_bet_probability, which we consider the most meaningful to include, and correlation_coef, which is the least related to the other metrics (as indicated by the large angle between it and all the other metrics).
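The "large angle" criterion can be made concrete: in a non-centered PCA, the angle between two metrics' column vectors is given by their (non-centered) cosine similarity. A sketch on synthetic stand-in vectors (`cosine_sim` is illustrative, not a function used in the analysis):

```r
# Cosine of the angle between two (non-centered) metric vectors;
# a small cosine means a large angle, i.e. less related metrics.
cosine_sim <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

set.seed(7)
m1 <- runif(20)                     # stand-in metric
m2 <- 0.9 * m1 + 0.1 * runif(20)    # strongly related to m1
m3 <- runif(20)                     # generated independently of m1

cosine_sim(m1, m2)   # close to 1: small angle, closely related
cosine_sim(m1, m3)   # smaller: larger angle, less related
```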
From dice_hd_bet_probability and
correlation_coef, we examined their relationship and tried
to derive different QC groups.
We first checked the distribution of each of the two metrics and used these distributions to identify a cut-off point. For dice_hd_bet_probability, we set it to 0.89, and for the correlation coefficient, we set it to 0.5 (indicated by the red lines).
FinMetric <- oasis_dice %>% select(dice_hd_bet_probability, correlation_coef) %>% as.data.frame
rownames(FinMetric) <- oasis_dice$participant_id
FinMetric %>% ggplot(aes(dice_hd_bet_probability)) + geom_histogram(bins = 30) + geom_vline(xintercept = 0.89, color = "red", linewidth = 2)
FinMetric %>% ggplot(aes(correlation_coef)) + geom_histogram(bins = 25) + geom_vline(xintercept = 0.5, color = "red", linewidth = 2)
We plotted the two selected metrics against each other, coloured by their newly derived pass/fail outcomes.
The correlation between the two metrics is 0.8652564
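The both_group variable filtered on below is not shown being created; presumably it pastes the two pass/fail flags together in the order (Pass/Fail_corr_coef)_(Pass/Fail_dice_hd_bet). A minimal sketch, assuming the cut-offs 0.5 and 0.89 chosen above (the data values and participant names are stand-ins):

```r
# Stand-in for FinMetric; the real one holds the two selected metrics
# per participant, with participant IDs as row names.
FinMetric <- data.frame(
  dice_hd_bet_probability = c(0.91, 0.83, 0.94),
  correlation_coef        = c(0.61, 0.44, 0.62),
  row.names = c("sub-A", "sub-B", "sub-C")
)

# Combine the two threshold decisions into one label such as "TRUE_TRUE"
# (correlation_coef pass first, dice_hd_bet pass second -- an assumption).
FinMetric$both_group <- paste(FinMetric$correlation_coef > 0.5,
                              FinMetric$dice_hd_bet_probability > 0.89,
                              sep = "_")

# Participants passing both checks (base-R equivalent of the dplyr filter below)
rownames(FinMetric[FinMetric$both_group == "TRUE_TRUE", ])  # "sub-A" "sub-C"
```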
Groups are labelled (Pass/Fail_corr_coef)_(Pass/Fail_dice_hd_bet).
FinMetric %>% filter(both_group == "TRUE_TRUE") %>% rownames
## [1] "sub-OASIS10002" "sub-OASIS10011" "sub-OASIS10037" "sub-OASIS10042"
## [5] "sub-OASIS10045" "sub-OASIS10051" "sub-OASIS10060" "sub-OASIS10062"
## [9] "sub-OASIS10063" "sub-OASIS10071" "sub-OASIS10081" "sub-OASIS10103"
## [13] "sub-OASIS10112" "sub-OASIS10117" "sub-OASIS10126" "sub-OASIS10136"
## [17] "sub-OASIS10142" "sub-OASIS10150" "sub-OASIS10151" "sub-OASIS10156"
## [21] "sub-OASIS10167" "sub-OASIS10183" "sub-OASIS10184" "sub-OASIS10185"
## [25] "sub-OASIS10195" "sub-OASIS10207" "sub-OASIS10218" "sub-OASIS10260"
## [29] "sub-OASIS10261" "sub-OASIS10266" "sub-OASIS10280" "sub-OASIS10287"
## [33] "sub-OASIS10294" "sub-OASIS10303" "sub-OASIS10309" "sub-OASIS10310"
## [37] "sub-OASIS10316" "sub-OASIS10318" "sub-OASIS10323" "sub-OASIS10329"
## [41] "sub-OASIS10333" "sub-OASIS10342" "sub-OASIS10348" "sub-OASIS10357"
## [45] "sub-OASIS10363" "sub-OASIS10380" "sub-OASIS10388" "sub-OASIS10396"
## [49] "sub-OASIS10406" "sub-OASIS10413" "sub-OASIS10415" "sub-OASIS10420"
## [53] "sub-OASIS10423" "sub-OASIS10430" "sub-OASIS10432" "sub-OASIS10435"
## [57] "sub-OASIS10439" "sub-OASIS10446"
FinMetric %>% filter(both_group == "FALSE_FALSE") %>% rownames
## [1] "sub-OASIS10010" "sub-OASIS10110" "sub-OASIS10162" "sub-OASIS10179"
## [5] "sub-OASIS10199" "sub-OASIS10222" "sub-OASIS10223" "sub-OASIS10227"
## [9] "sub-OASIS10373" "sub-OASIS10398" "sub-OASIS10400"
FinMetric %>% filter(both_group == "TRUE_FALSE") %>% rownames
## [1] "sub-OASIS10114" "sub-OASIS10291" "sub-OASIS10377"
FinMetric %>% filter(both_group == "FALSE_TRUE") %>% rownames
## [1] "sub-OASIS10084" "sub-OASIS10155" "sub-OASIS10180" "sub-OASIS10371"
## [5] "sub-OASIS10372" "sub-OASIS10402"